ssl: fix handshake hang that manifested on OS X. #2624

Merged
htuch merged 4 commits into envoyproxy:master from htuch:ssl-handshake-write
Feb 15, 2018
@htuch commented Feb 15, 2018

When a handshake completed in transport socket doRead(), we didn't check
to see if there were any pending bytes in the write buffer in
Network::ConnectionImpl. This showed up in #2598, where
ads_integration_test would hang while waiting for the server to connect
to the fake management upstream.

Risk Level: Medium (messing around with connection handshaking can do
bad things).
Testing: Additional unit test for ConnectionImpl changes. Validated
ads_integration_test now passes on OS X.

Signed-off-by: Harvey Tuch <htuch@google.com>
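
The failure mode described above can be illustrated with a minimal sketch. This is a toy model, not Envoy's Network::ConnectionImpl: the class `HandshakeConnection`, the enum `HandshakeEvent`, and the `flush_on_connected` switch are all hypothetical names introduced here. It assumes only what the description states: data is queued before the handshake finishes, the handshake completes on the read path, and nothing flushes the write buffer unless the Connected event triggers a write attempt.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a connection; not Envoy's real class.
enum class HandshakeEvent { Connected };

class HandshakeConnection {
public:
  // flush_on_connected = false models the old behavior, true models the fix.
  explicit HandshakeConnection(bool flush_on_connected)
      : flush_on_connected_(flush_on_connected) {}

  // The application queues data while the handshake is still in flight.
  void write(const std::string& data) { write_buffer_ += data; }

  // Read path: the handshake may finish here, with no write event pending.
  void onReadReady() {
    if (!handshake_done_) {
      handshake_done_ = true;
      raiseEvent(HandshakeEvent::Connected);
    }
  }

  const std::string& sent() const { return sent_; }
  const std::string& pending() const { return write_buffer_; }

private:
  void raiseEvent(HandshakeEvent event) {
    // Give buffered bytes a chance to go out once the connection is established.
    if (flush_on_connected_ && event == HandshakeEvent::Connected &&
        !write_buffer_.empty()) {
      onWriteReady();
    }
  }

  void onWriteReady() {
    assert(handshake_done_); // In this toy, writes only happen after the handshake.
    sent_ += write_buffer_;
    write_buffer_.clear();
  }

  bool flush_on_connected_;
  bool handshake_done_{false};
  std::string write_buffer_;
  std::string sent_;
};
```

With `flush_on_connected` set to false, bytes queued before `onReadReady()` stay in `pending()` until some later write event arrives; with it set to true they are moved to `sent()` as soon as the handshake completes.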

// completion, which may also have completed in the context of onReadReady(),
// where no check of the write buffer is made. Provide an opportunity to flush
// here. If connection write is not ready, this is harmless.
if (event == ConnectionEvent::Connected && write_buffer_->length() > 0) {
Member

Should this happen before notifying callbacks?

If we leave this after the above loop, we should check whether the connection was closed before doing this.

Buffer::OwnedImpl buffer(val);
EXPECT_CALL(*file_event_, activate(Event::FileReadyType::Write)).WillOnce(Invoke(file_ready_cb_));
EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false))
    .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false}));
Member

I think you should return 0 instead of 100; otherwise you're claiming that 100 bytes were written, but this test scenario is saying that it cannot be written yet (and 100 bytes is more than the size of the write, so it's weird).

Member

Please return 0 on this one

// in the write buffer, we should see a doWrite with this data.
EXPECT_CALL(*transport_socket_, doRead(_)).WillOnce(InvokeWithoutArgs([this] {
  transport_socket_callbacks_->raiseEvent(Network::ConnectionEvent::Connected);
  return IoResult{PostIoAction::KeepOpen, 100, false};
Member

I think if you're going to return 100, you should put 100 bytes in the buffer. Otherwise it's an API violation.
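
The byte-count point raised in these comments can be sketched as follows. This is an illustrative toy, not Envoy's IoResult or mock transport socket: `ToyIoResult`, `fakeRead`, `fakeBlockedWrite`, and `writeResultIsConsistent` are hypothetical names, and the meaning of the middle field (bytes processed) is an assumption taken from the review discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical mirror of the three values passed to IoResult in the test above.
struct ToyIoResult {
  bool keep_open;
  uint64_t bytes_processed; // Assumed meaning: bytes actually read or written.
  bool end_stream_read;
};

// Fake read: appends exactly n bytes so the reported count matches the buffer growth.
ToyIoResult fakeRead(std::string& buffer, uint64_t n) {
  const std::size_t before = buffer.size();
  buffer.append(n, 'x');
  assert(buffer.size() - before == n);
  return ToyIoResult{true, n, false};
}

// Fake write that cannot make progress yet: consumes nothing and reports 0.
ToyIoResult fakeBlockedWrite(const std::string& buffer) {
  (void)buffer; // Left untouched on purpose.
  return ToyIoResult{true, 0, false};
}

// Contract check: a write can never report more bytes than were available.
bool writeResultIsConsistent(const ToyIoResult& result, std::size_t buffer_len_before) {
  return result.bytes_processed <= buffer_len_before;
}
```

Under this reading, a mocked write that reports 100 bytes against a shorter buffer fails the consistency check, while a blocked write that reports 0 passes it.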

}));
EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false))
    .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false}));
EXPECT_CALL(*transport_socket_, doWrite(_, true))
Member

What causes this call? Is this from shutdown of the test? If so, let's put this EXPECT after the file_ready_cb_ below.

Signed-off-by: Harvey Tuch <htuch@google.com>
// where no check of the write buffer is made. Provide an opportunity to flush
// here. If connection write is not ready, this is harmless.
if (event == ConnectionEvent::Connected && write_buffer_->length() > 0) {
  onWriteReady();
Member

onWriteReady() can close the socket. Need to check that before continuing. Although, if that happens, filters will get notified of a close before being notified that it was connected.

Hmmm... Thinking this through, it is probably better to notify all filters that they're connected (i.e., put it back to the way it was before), and make this case be:

if (state() == State::Open && event == ConnectionEvent::Connected && write_buffer_->length() > 0)
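
A minimal sketch of the ordering suggested here, under stated assumptions: every callback is notified of Connected first, and the flush is attempted only if the connection is still open afterwards. `GuardedConnection`, `ConnState`, and `ConnEvent` are hypothetical names for a toy model, not Envoy's ConnectionImpl.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class ConnState { Open, Closed };
enum class ConnEvent { Connected };

// Hypothetical toy connection used only to show the ordering of notify and flush.
class GuardedConnection {
public:
  using Callback = std::function<void(GuardedConnection&, ConnEvent)>;

  void addCallback(Callback cb) { callbacks_.push_back(std::move(cb)); }
  void write(const std::string& data) { write_buffer_ += data; }
  void close() { state_ = ConnState::Closed; }
  ConnState state() const { return state_; }
  int flushes() const { return flushes_; }

  void raiseEvent(ConnEvent event) {
    // Notify every callback first, so none sees a close before Connected.
    for (const auto& cb : callbacks_) {
      cb(*this, event);
    }
    // Any callback may have closed the connection; flush only if still open.
    if (state() == ConnState::Open && event == ConnEvent::Connected &&
        !write_buffer_.empty()) {
      onWriteReady();
    }
  }

private:
  void onWriteReady() {
    assert(state_ == ConnState::Open); // Guaranteed by the guard in raiseEvent().
    write_buffer_.clear();
    ++flushes_;
  }

  ConnState state_{ConnState::Open};
  std::vector<Callback> callbacks_;
  std::string write_buffer_;
  int flushes_{0};
};
```

In this toy, a callback that calls `close()` during the Connected notification prevents the flush, while a connection left open flushes exactly once.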

Member

Sorry to go in circles on this; ConnectionImpl is complicated.

Member Author

Sure, this is an area of the code I'm less familiar with, and it's hard to reason about what callbacks can and can't do here, so thanks for the patience.

Signed-off-by: Harvey Tuch <htuch@google.com>
Member

@ggreenway left a comment

1 more small change, then I think this is good to go

Buffer::OwnedImpl buffer(val);
EXPECT_CALL(*file_event_, activate(Event::FileReadyType::Write)).WillOnce(Invoke(file_ready_cb_));
EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false))
    .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false}));
Member

Please return 0 on this one

Signed-off-by: Harvey Tuch <htuch@google.com>
@htuch merged commit df8007e into envoyproxy:master Feb 15, 2018
@htuch deleted the ssl-handshake-write branch February 15, 2018 20:07
lita pushed a commit to lita/envoy that referenced this pull request Feb 15, 2018
// where no check of the write buffer is made. Provide an opportunity to flush
// here. If connection write is not ready, this is harmless. We should only do
// this if we're still open (the above callbacks may have closed).
if (state() == State::Open && event == ConnectionEvent::Connected &&
Member

@htuch @ggreenway I'm late here, but I think it's slightly strange that in the plaintext case we effectively recurse into onWriteReady(), which already handles this case by trying to write after a connection happens. Did we consider any other variant of this change? It's not a big deal, just curious.

Member

That is weird; I didn't catch that. No, we didn't try anything else. We should look into whether there's a better way to handle this. It seems to work, but it is strange and hard to reason about.

Member Author

It works because we flip connecting_ status before re-entry. I agree this is harder to reason about than would be ideal. Should we just return unconditionally in raw buffer socket onWriteReady() following a new connection and defer to the new write check?

Member

@htuch agreed, the code works as is. To make it clearer, yes, I would either return unconditionally with a comment here https://github.com/envoyproxy/envoy/blob/master/source/common/network/connection_impl.cc#L477, or factor out the second part of onWriteReady() into a separate function and call that in the SSL case when the handshake completes during a read callback. I think the former is simpler.
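
The re-entry discussed in this thread can be sketched with a toy model. It assumes, as stated above, that the connecting flag is flipped before the Connected event is raised, so the nested call into onWriteReady() skips the connect branch and only flushes. `ReentrantConnection` and its members are hypothetical names; this is not the code at the linked line.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy showing why the nested onWriteReady() call terminates.
class ReentrantConnection {
public:
  void write(const std::string& data) { write_buffer_ += data; }

  // On a plaintext socket, the first write-ready event also means "connect finished".
  void onWriteReady() {
    ++depth_;
    if (depth_ > max_depth_) {
      max_depth_ = depth_;
    }
    assert(depth_ <= 2); // Recursion is bounded: connecting_ is false on re-entry.
    if (connecting_) {
      connecting_ = false; // Flipped before raising the event...
      raiseConnected();    // ...so the nested call below skips this branch.
    }
    flush();
    --depth_;
  }

  const std::string& sent() const { return sent_; }
  int flushes() const { return flushes_; }
  int maxDepth() const { return max_depth_; }

private:
  // Models the new write check performed when Connected is raised.
  void raiseConnected() {
    if (!write_buffer_.empty()) {
      onWriteReady(); // Re-entry point.
    }
  }

  void flush() {
    if (!write_buffer_.empty()) {
      sent_ += write_buffer_;
      write_buffer_.clear();
      ++flushes_;
    }
  }

  bool connecting_{true};
  int depth_{0};
  int max_depth_{0};
  int flushes_{0};
  std::string write_buffer_;
  std::string sent_;
};
```

In this sketch the nested call performs the only real flush and the outer call finds an empty buffer, which is consistent with the observation that the code works but is harder to reason about than a single, non-reentrant write path.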
